5 research outputs found

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication date: 01/01/2016

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot.

arXiv.org e-Print Archive

Copenhagen University Research Information System

Optimally Streaming Greedy Regular Expression Parsing

Author: Niels Bjørn Bugge Grathwohl
Fritz Henglein
Ulrik Terp Rasmussen
Publication date: 24/04/2020

We study the problem of streaming regular expression parsing: Given a regular expression and an input stream of symbols, how to output a serialized syntax tree representation as an output stream during input stream processing. We show that optimally streaming regular expression parsing, outputting bits of the output as early as is semantically possible for any regular expression of size m and any input string of length n, can be performed in time O(2^(m log m) + mn) on a unit-cost random-access machine. This is for the widespread greedy disambiguation strategy for choosing parse trees of grammatically ambiguous regular expressions. In particular, for a fixed regular expression, the algorithm's run-time scales linearly with the input string length. The exponential is due to the need for preprocessing the regular expression to analyze state coverage of its associated NFA, a PSPACE-hard problem, and tabulating all reachable ordered sets of NFA-states. Previous regular expression parsing algorithms operate in multiple phases, always requiring processing or storing the whole input string before outputting the first bit of output, not only for those regular expressions and input prefixes where reading to the end of the input is strictly necessary.
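
As an illustration of what greedy disambiguation and a serialized (bit-coded) parse tree mean, here is a minimal Python sketch. It is not the algorithm from the paper: it uses plain backtracking, so it is neither linear-time nor streaming, and it only shows which parse the greedy policy selects. The class names, the function names `parses` and `greedy_parse`, and the bit convention (0 = left alternative or "iterate again", 1 = right alternative or "stop iterating") are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union


@dataclass(frozen=True)
class Sym:
    char: str


@dataclass(frozen=True)
class Seq:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Sym, Seq, Alt, Star]


def parses(r: Regex, s: str, i: int) -> Iterator[Tuple[int, str]]:
    """Yield (end position, bit code) for parses of prefixes of s[i:],
    in greedy priority order (left alternative first, iterate before stop)."""
    if isinstance(r, Sym):
        if i < len(s) and s[i] == r.char:
            yield i + 1, ""
    elif isinstance(r, Seq):
        for j, bits_left in parses(r.left, s, i):
            for k, bits_right in parses(r.right, s, j):
                yield k, bits_left + bits_right
    elif isinstance(r, Alt):
        for j, bits in parses(r.left, s, i):
            yield j, "0" + bits  # assumed convention: 0 = left branch
        for j, bits in parses(r.right, s, i):
            yield j, "1" + bits  # assumed convention: 1 = right branch
    elif isinstance(r, Star):
        for j, bits in parses(r.body, s, i):
            if j > i:  # require progress so the recursion terminates
                for k, rest in parses(r, s, j):
                    yield k, "0" + bits + rest  # 0 = one more iteration
        yield i, "1"  # 1 = stop iterating


def greedy_parse(r: Regex, s: str) -> Optional[str]:
    """Return the bit code of the first full-match parse in greedy order."""
    for end, bits in parses(r, s, 0):
        if end == len(s):
            return bits
    return None
```

For the ambiguous expression (a|ab)(bc|c) and the input "abc", this sketch prefers the left alternative twice, which is the choice a Perl-style backtracking matcher would make.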

CiteSeerX

Structural logical relations with case analysis and equality reasoning

Author: Filinski Andrzej
Rasmussen Ulrik Terp
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Copenhagen University Research Information System

Two-pass greedy regular expression parsing

Author: Grathwohl Niels Bjørn Bugge
Henglein Fritz
Nielsen Lasse
Rasmussen Ulrik Terp
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Copenhagen University Research Information System

Kleenex: compiling nondeterministic transducers to deterministic streaming transducers

Author: Grathwohl Niels Bjørn Bugge
Henglein Fritz
Rasmussen Ulrik Terp
Søholm Kristoffer Aalund
Torholm Sebastian Paaske
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

Copenhagen University Research Information System